Mining Ethnic Content Online with Additively Regularized Topic Models

Authors

  • Murat Apishev
  • Sergei Koltcov
  • Olessia Koltsova
  • Sergey I. Nikolenko
  • Konstantin Vorontsov

Abstract

Social studies of the Internet have adopted large-scale text mining for the unsupervised discovery of topics related to specific subjects. A recently developed approach to topic modeling, additive regularization of topic models (ARTM), provides fast inference and, through a wide variety of possible regularizers, more control over the topics than developing LDA extensions does. We apply ARTM to mining ethnicity-related content from the Russian-language blogosphere, introduce a new combined regularizer, and compare models derived from ARTM with LDA. Human evaluations show that ARTM is better suited to mining topics on specific subjects, finding more relevant topics of higher or comparable quality.
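To make the abstract's description of ARTM more concrete, here is a minimal pure-Python sketch of the regularized EM iteration used in the ARTM framework. The E-step computes p(t|d,w) proportional to φ_wt θ_td. The M-step sets φ_wt proportional to (n_wt + φ_wt ∂R/∂φ_wt)_+ and θ_td proportional to (n_td + θ_td ∂R/∂θ_td)_+, where (·)_+ clips negative values to zero before renormalizing. The sketch assumes a single smoothing/sparsing regularizer R = τ_φ Σ ln φ_wt + τ_θ Σ ln θ_td. It is not the authors' combined regularizer, their code, or the BigARTM library API, and the names `artm_fit`, `normalize_positive`, `tau_phi` and `tau_theta` are hypothetical.

```python
import random


def normalize_positive(values):
    """ARTM-style norm operator (hypothetical helper): clip negative entries
    to zero, then rescale so the entries sum to 1. Returns all zeros if
    nothing positive remains."""
    clipped = [max(v, 0.0) for v in values]
    total = sum(clipped)
    if total == 0.0:
        return [0.0] * len(values)
    return [v / total for v in clipped]


def artm_fit(docs, vocab_size, num_topics, tau_phi=0.0, tau_theta=0.0,
             iterations=50, seed=0):
    """Hypothetical helper. docs: list of {word_id: count} dicts.
    Returns (phi, theta) with phi[t][w] ~ p(w|t) and theta[d][t] ~ p(t|d).
    Assumed regularizer: R = tau_phi * sum ln(phi_wt) + tau_theta * sum ln(theta_td),
    so phi_wt * dR/dphi_wt = tau_phi and theta_td * dR/dtheta_td = tau_theta.
    tau > 0 smooths, tau < 0 sparsifies, both taus = 0 is plain PLSA."""
    rng = random.Random(seed)
    # Random positive initialization; each topic is a distribution over words.
    phi = [normalize_positive([rng.random() + 1e-3 for _ in range(vocab_size)])
           for _ in range(num_topics)]
    # Uniform initial topic mixture for every document.
    theta = [[1.0 / num_topics] * num_topics for _ in docs]
    for _ in range(iterations):
        n_wt = [[0.0] * vocab_size for _ in range(num_topics)]
        n_td = [[0.0] * num_topics for _ in docs]
        # E-step: p(t|d,w) = norm_t(phi_wt * theta_td); accumulate expected counts.
        for d, doc in enumerate(docs):
            for w, n_dw in doc.items():
                p_tdw = normalize_positive(
                    [phi[t][w] * theta[d][t] for t in range(num_topics)])
                for t in range(num_topics):
                    n_wt[t][w] += n_dw * p_tdw[t]
                    n_td[d][t] += n_dw * p_tdw[t]
        # Regularized M-step: add the regularizer term, clip negatives, renormalize.
        phi = [normalize_positive([n_wt[t][w] + tau_phi for w in range(vocab_size)])
               for t in range(num_topics)]
        theta = [normalize_positive([n_td[d][t] + tau_theta for t in range(num_topics)])
                 for d in range(len(docs))]
    return phi, theta
```

For example, `artm_fit([{0: 3, 1: 2}, {2: 4, 3: 1}], vocab_size=5, num_topics=2, tau_phi=-0.5)` runs the sparsing variant, which drives rarely assigned words in each topic to exactly zero. In ARTM, combining several regularizers means summing their weighted terms in R, so each would simply contribute its own additive term inside the M-step.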

Similar resources

Topic Models over Text Streams: A Study of Batch and Online Unsupervised Learning

Topic modeling techniques have widespread use in text data mining applications. Some applications use batch models, which perform clustering on the document collection in aggregate. In this paper, we analyze and compare the performance of three recently proposed batch topic models: Latent Dirichlet Allocation (LDA), Dirichlet Compound Multinomial (DCM) mixtures and von Mises-Fisher (vMF) mixture...

A Regularized Latent Semantic Indexing: A New Approach to Large Scale Topic Modeling

Topic modeling provides a powerful way to analyze the content of a collection of documents. It has become a popular tool in research areas such as text mining, information retrieval, natural language processing, and other related fields. In real-world applications, however, the usefulness of topic modeling is limited due to scalability issues. Scaling to larger document collections via paralleli...

Topic modeling for OLAP on multidimensional text databases: topic cube and its applications

As the amount of textual information grows explosively in various kinds of business systems, it becomes more and more desirable to analyze both structured data records and unstructured text data simultaneously. While online analytical processing (OLAP) techniques have been proven very useful for analyzing and mining structured data, they face challenges in handling text data. On the other hand,...

Survey on Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph

A social networking service is an online service, platform, or site that focuses on facilitating the building of social networks or social relations among people who, for example, share interests, activities, backgrounds, or real-life connections. A social network service consists of a representation of each user, his/her social links, and a variety of additional services. Most social network se...

Topic Cube: Topic Modeling for OLAP on Multidimensional Text Databases

As the amount of textual information grows explosively in various kinds of business systems, it becomes more and more desirable to analyze both structured data records and unstructured text data simultaneously. While online analytical processing (OLAP) techniques have been proven very useful for analyzing and mining structured data, they face challenges in handling text data. On the other hand,...

Journal title:
  • Computación y Sistemas

Volume 20

Publication date: 2016